An Immune System Paradigm for the Assurance of Dependability of Collaborative Self-organizing Systems
Author: Algirdas Avižienis

Abstract

In collaborative self-organizing computing systems, a complex task is performed by relatively simple autonomous agents that act without centralized control. Disruption of a task can be caused by agents that produce harmful outputs due to internal failures or due to maliciously introduced alterations of their functions. The probability of such harmful outputs is minimized by the application of a design principle called “the immune system paradigm” that provides individual agents with an all-hardware fault tolerance infrastructure. The paradigm and its application are described in this paper.

1 Dependability Issues of Collaborative Self-Organizing Systems

Self-organizing computing systems can be considered to be a class of distributed computing systems. To assure the dependability of conventional distributed systems, fault tolerance techniques are employed [1]. Individual elements of the distributed system are grouped into clusters, and consensus algorithms are implemented by members of the cluster [2], or mutual diagnosis is carried out within the cluster.

Self-organizing systems differ from conventional distributed systems in that their structure is dynamic [3]. Relatively simple autonomous agents act without central control in jointly carrying out a complex task. The dynamic nature of such systems makes the implementation of consensus or mutual diagnosis impractical, since constant membership of the clusters of agents cannot be assured as the system evolves.

An agent that suffers an internal fault or external interference may fail and produce harmful outputs that disrupt the task being carried out by the collaborative system. Even more harmful can be alterations of the agent’s function that are maliciously introduced (by intrusion or by malicious software) and lead to deliberately harmful outputs.

The biological analogy of the fault or interference that affects an agent is an infection that can lead to loss of the agent’s functions and also to transmission of the infection to other agents that receive the harmful outputs, possibly causing an epidemic. The biologically inspired solution that I have proposed is the introduction within the agent of a fault tolerance mechanism, called the fault tolerance infrastructure (FTI), that is analogous to the immune system of a human being [4,5]. Every agent has its own FTI, and therefore consensus algorithms are no longer necessary to protect the system.

2 A Design Principle: the Immune System Paradigm

My objective is to design the FTI for an autonomous agent that is part of a self-organizing system. I assume that the agent is composed of both hardware and software subsystems and communicates with other agents via wireless links. I then employ the following three analogies to derive a design principle called “the immune system paradigm”: (1) the human body is analogous to hardware, (2) consciousness is analogous to software, (3) the immune system of the body is analogous to the fault tolerance infrastructure (FTI).

In determining the properties that the FTI must possess, four fundamental attributes of the immune system are especially relevant [6]:
(1) It is a part of the body that functions (i.e., detects and reacts to threats) continuously and autonomously, independently of consciousness.
(2) Its elements (lymph nodes, other lymphoid organs, lymphocytes) are distributed throughout the body, serving all its organs.
(3) It has its own communication links: the network of lymphatic vessels.
(4) Its elements (cells, organs, and vessels) themselves are self-defended, redundant and in several cases diverse.

Now we can identify the properties that the FTI must have in order to justify the immune system analogy. They are as follows:
(1a) The FTI consists of hardware and firmware elements only.
(1b) The FTI is independent of (that is, it requires no support from) any software of the agent, but can communicate with it.
(1c) The FTI supports (provides protected decision algorithms for) multichannel computing by the agent, including diverse hardware and software channels that provide design fault tolerance for the agent’s hardware and software.
(2) The FTI is compatible with (i.e., protects) a wide range of the agent’s hardware components, including processors, memories, supporting chipsets, discs, power supplies, fans and various peripherals.
(3) Elements of the FTI are distributed throughout the agent’s hardware and are interconnected by their own autonomous communication links.
(4) The FTI is fully fault-tolerant itself and requires no external support. It is not susceptible to attacks by intrusion or malicious software and is not affected by natural or design faults of the agent’s hardware and software.
(5) An additional essential requirement is that the FTI provides status outputs to those other agents with which it can communicate. The outputs indicate the state of the agent’s health: perfect or undergoing recovery action. Upon failure of the agent’s function the FTI shuts down all its outputs and issues a permanent status output indicating failure.

The set of design requirements listed above is called the immune system paradigm. It defines an FTI that can be considered to be the agent’s immune system that defends its “body” (i.e., hardware) against “infections” caused by internal faults, external interference, intrusions, and attacks by malicious software. The FTI also informs the other agents in its environment of its state of health. Such an FTI is generic, that is, it can serve a variety of agents. Furthermore, it is transparent to the agent’s software, compatible with other defenses used by the agent, and fully self-protected by fault tolerance.
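Requirement (5) above can be read as a small state machine. The following is a minimal sketch under that reading, not the author's design: the class, method, and state names are hypothetical, and the real FTI is specified as hardware and firmware rather than software.

```python
from enum import Enum


class Health(Enum):
    """Health states named in requirement (5); the identifiers are assumptions."""
    PERFECT = "perfect"
    RECOVERING = "undergoing recovery action"
    FAILED = "failed"


class StatusOutput:
    """Hypothetical model of the status output an FTI presents to other agents."""

    def __init__(self):
        self.health = Health.PERFECT
        self.outputs_enabled = True

    def error_detected(self):
        # A detected error starts a recovery action, unless failure is already latched.
        if self.health is not Health.FAILED:
            self.health = Health.RECOVERING

    def recovery_succeeded(self):
        # Only an agent that is recovering can return to perfect health.
        if self.health is Health.RECOVERING:
            self.health = Health.PERFECT

    def agent_failed(self):
        # Failure is permanent: all outputs are shut down and the status latches.
        self.health = Health.FAILED
        self.outputs_enabled = False
```

In this sketch the failed state is absorbing: once `agent_failed` has been called, later `error_detected` or `recovery_succeeded` calls leave the status unchanged, which mirrors the "permanent status output" wording of the requirement.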
A different and independently devised analogy of the immune system is the “Artificial Immune System” (AIS) of S. Forrest and S. A. Hofmeyr [7]. Its origins are in computer security research, where the motivating objective was protection against illegal intrusions. The analogy of the body is a local-area broadcast network, and the AIS protects it by detecting connections that are not normally observed on the LAN. Immune responses are not included in the model of the AIS, while they are the essence of the FTI.

3 Architecture of the Fault Tolerance Infrastructure

The preceding sections have presented a general discussion of an FTI that serves as the analog of an immune system for the hardware of an agent of a self-organizing system. Such an FTI can be placed on a single hardware component, or it can be used to protect a board with several components, or an entire chassis [5]. To demonstrate that the FTI is a practically implementable and rather simple hardware structure, this and the next section describe an FTI design that was intended to protect a system composed of Intel P6 processors and associated chipsets and was first presented in [5].

The FTI is a system composed of four types of special-purpose controllers called “nodes”. The nodes are ASICs (Application-Specific Integrated Circuits) that are controlled by hard-wired sequencers or by read-only microcode. The basic structure of the FTI is shown in Figure 1. The figure does not show the redundant nodes needed for fault tolerance of the FTI itself.

The C (Computing) node is a COTS processor or other hardware component of the agent being protected by the FTI. One A (Adapter) node is provided for each C node. All error signal outputs and recovery command inputs of the C node are connected to its A node. Within the FTI, all A nodes are connected to one M (Monitor) node via the M (Monitor) bus. Each A node also has a direct input (the A line) to the M node.
The A nodes convey the C node error messages to the M node. They also receive recovery commands from the M node and issue them to C node inputs. The A line serves to request M node attention for an incoming error message. The M node stores in ROM the responses to error signals from every type of C node and the sequences for its own recovery. It also stores system configuration and system time data and its own activity records.

The M node is connected to the S3 (Startup, Shutdown, Survival) node. The functions of the S3 node are to control power-on and power-off sequences for the entire agent, to generate fault-tolerant clock signals, and to provide non-volatile, radiation-hardened storage for system time and configuration. The S3 node has a backup power supply (e.g., a battery) and remains on at all times during the life of the FTI.

The D (Decision) node provides fault-tolerant comparison and voting services for the C nodes, including decision algorithms for N-version software executing on diverse processors (C nodes). Fast response of the D node is assured by hardware implementation of the decision algorithms. The D node also keeps a log of disagreements in the decisions. The second function of the D node is to serve as a communication link between the software of the C nodes and the M node. C nodes may request configuration and M node activity data or send power control commands. The D node has a built-in A node (the A port) that links it to the M node.

Another function of the FTI is to provide fault-tolerant power management for the entire agent system, including individual power switches for every C node, as shown in Figure 1. Every node except the S3 has a power switch. The FTI has its own fault-tolerant power supply (IP).
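The comparison and voting service of the D node described above can be illustrated with a short software model. This is only a sketch under stated assumptions: the text specifies a hardware implementation and does not give the decision algorithm, so strict-majority voting is assumed here, and the class name, method name, and log format are hypothetical.

```python
from collections import Counter


class DecisionNode:
    """Hypothetical software model of the D node's voting service."""

    def __init__(self):
        # Each entry lists the channels that disagreed with the decision.
        self.disagreement_log = []

    def vote(self, results):
        """Return the strict-majority value among channel results, or None.

        `results` maps a channel (C node) name to the hashable value it produced.
        """
        if not results:
            raise ValueError("at least one channel result is required")
        counts = Counter(results.values())
        value, count = counts.most_common(1)[0]
        # Assumed rule: a value wins only if more than half the channels agree.
        majority = value if count > len(results) / 2 else None
        dissenters = sorted(ch for ch, v in results.items() if v != majority)
        if dissenters:
            self.disagreement_log.append(dissenters)
        return majority
```

For example, with three diverse channels reporting 7, 7, and 9, `vote` returns 7 and logs the third channel as a dissenter. When no strict majority exists, it returns `None` and logs every channel, leaving the choice of recovery action to the caller.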